Fault Tolerant Distributed Information Systems
Authors
Abstract
Critical infrastructures provide services upon which society depends heavily. These applications are themselves dependent on distributed information systems for all aspects of their operation, so the survivability of those information systems is an important issue. Fault tolerance is a key mechanism by which survivability can be achieved in these information systems. We outline a specification-based approach to fault tolerance, called RAPTOR, that enables systematic structuring of fault tolerance specifications and an implementation partially synthesized from the formal specification. The RAPTOR approach consists of three specifications describing the fault-tolerant system, the errors to be detected, and the actions to take to recover from those errors. System specification utilizes an object-oriented database to store the descriptions associated with these large, complex systems, while the error detection and error recovery specifications are defined using the formal specification notation Z. We also describe a novel implementation architecture and explore our solution through the use of two case study applications.
Related resources
A generalized ABFT technique using a fault tolerant neural network
In this paper we first show that the standard BP algorithm cannot yield a uniform information distribution over the neural network architecture. A measure of sensitivity is defined to evaluate the fault tolerance of a neural network, and then we show that the sensitivity of a link is closely related to the amount of information that passes through it. Based on this assumption, we prove that the distribu...
Synthesis of Fault-Tolerant Distributed Systems
A distributed system is fault-tolerant if it continues to perform correctly even when a subset of the processes becomes faulty. Fault tolerance is highly desirable but often difficult to implement. In this paper, we investigate fault-tolerant synthesis, i.e., the problem of determining whether a given temporal specification can be implemented as a fault-tolerant distributed system. As in standar...
A Replicated Monitoring Tool
Modeling the reliability of distributed systems requires a good understanding of the reliability of the components. Careful modeling allows highly fault-tolerant distributed applications to be constructed at the least cost. Realistic estimates can be found by measuring the performance of actual systems. An enormous amount of information about system performance can be acquired with no special p...
Towards Modeling and Model Checking Fault-Tolerant Distributed Algorithms
Fault-tolerant distributed algorithms are central for building reliable, spatially distributed systems. In order to ensure that these algorithms actually make systems more reliable, we must ensure that these algorithms are actually correct. Unfortunately, model checking state-of-the-art fault-tolerant distributed algorithms (such as Paxos) is currently out of reach except for very small systems....
Distributed Adaptive Fault-Tolerant Consensus Control of Multi-Agent Systems with Actuator Faults
This paper presents an adaptive fault-tolerant control (FTC) scheme for leader-follower consensus control of uncertain mobile agents with actuator faults. A local FTC component is designed for each agent in the distributed system by using local measurements and certain information exchanged between neighboring agents. Each local FTC component consists of a fault detection module and a reconfigu...
The Design of Real-Time Distributed information Systems with Object-Oriented and Fault-Tolerant Characteristics
For real-time distributed information systems (RT-DIS), a high degree of reliability, availability and safety is usually required. In this paper, we first discuss the major features that a design methodology for RT-DIS should possess. A survey of current design approaches is then given. The G-Net methodology for the design of RT-DIS with object-oriented and fault-tolerant characteristics is presen...